QMIR 2026

Week 3: R Fundamentals – Data Types & Structures

Tristan Muno
February 19, 2026

Week 3 Goals

By the end of today, you should:

  1. Understand the core concepts: objects, functions, and pipes
  2. Know the main data types R uses to store information
  3. Know the main data structures R uses to organize information
  4. Be able to access and explore data using [] and $

Agenda

  1. Leftovers from Last Week
  2. Objects, Functions & Pipes
  3. Data Types
  4. Data Structures
  5. Accessing & Exploring Data
  6. Walkthrough & Exercises

Leftovers from Last Week


  • Cloning from GitHub
    • This is how you access and submit the homework assignments and the take-home exam
  • Creating and linking your private projects with GitHub
  • Habits for effective and efficient coding:
    • learn to use shortcuts (e.g. opening and closing Positron panels)
    • learn to use your keyboard (e.g. highlighting text, adding code cells)
    • choose a theme and font that make you feel good

Part I: Objects, Functions & Pipes

R as a Language for Storing and Transforming Information

To understand […] R, two slogans are helpful:

Everything that exists is an object

Everything that happens is a function call

– John Chambers

  • You give R objects – named containers holding information
  • You apply functions – operations that transform objects
  • You chain steps together using pipes

Objects – Storing Information

An object has:

  • a name (how you refer to it)
  • a value (the information stored)
  • a type (what kind of information)

An R example:

# Assign a value to a name with <-
country <- "Germany"

gdp_per_capita <- 54300

is_eu_member <- TRUE

Objects – Storing Information

# print by typing the name
country
[1] "Germany"
gdp_per_capita
[1] 54300
is_eu_member
[1] TRUE
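A small sketch showing all three parts of an object at once, reusing the objects above. class() reports the type, and assigning to a name that already exists simply overwrites its value (the updated GDP figure is invented for illustration).

```r
# Objects from the slide above
country <- "Germany"
gdp_per_capita <- 54300
is_eu_member <- TRUE

# name = what you type; value = what R prints; type = what class() reports
class(country)         # "character"
class(gdp_per_capita)  # "numeric"
class(is_eu_member)    # "logical"

# Assigning to an existing name replaces the old value
gdp_per_capita <- 55000  # hypothetical updated figure
gdp_per_capita           # 55000
```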

Functions – Doing Things with Objects

A function:

  • takes one or more inputs (arguments)
  • does something
  • returns an output

function_name(argument1, argument2, ...)

gdp_values <- c(54300, 43800, 33200)

# Functions transform objects
mean(gdp_values)
[1] 43766.67
sqrt(gdp_per_capita)
[1] 233.0236
nchar(country)
[1] 7
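Arguments can be passed by position or by name. This sketch uses a hypothetical GDP vector with one missing value (NA) to show why a named argument such as na.rm matters:

```r
# Hypothetical GDP values with one missing entry
gdp_values <- c(54300, 43800, NA)

mean(gdp_values)                # NA: one missing value makes the result missing
mean(gdp_values, na.rm = TRUE)  # 49050: NA removed before averaging

# Positional and named arguments give the same result
round(233.0236, 1)           # 233
round(233.0236, digits = 1)  # 233
```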

Functions – Doing Things with Objects

You can write your own functions.

Functions can be:

  • small personal helpers
  • reusable analysis tools
  • shared with others

👉 Collections of useful, published functions form packages. We will explore some next week.

Packages are what make R (and Python) powerful ecosystems.

# A small custom helper
add_tax <- function(price) {
  price * 1.19
}

add_tax(10)
[1] 11.9
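Functions can also have arguments with default values. Below is a hypothetical variant of add_tax() with a rate argument that defaults to the 19% used above; passing rate explicitly overrides the default.

```r
# Hypothetical variant: the tax rate becomes an argument with a default
add_tax <- function(price, rate = 0.19) {
  price * (1 + rate)
}

add_tax(10)               # 11.9, uses the default rate
add_tax(10, rate = 0.07)  # 10.7, overrides the default
add_tax(c(10, 20))        # 11.9 23.8, works on whole vectors too
```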

The Pipe – Chaining Operations

Without the pipe, nested calls become hard to read:

round(mean(sqrt(gdp_values)), digits = 2)
[1] 208.17

With the pipe |>, read left to right – “and then”:

gdp_values |> sqrt() |> mean() |> round(digits = 2)
[1] 208.17

The pipe says: take this, and then do that.
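Mechanically, x |> f(y) is evaluated as f(x, y): the value on the left becomes the first argument of the function on the right. The native |> pipe requires R 4.1 or later. A quick check:

```r
gdp_values <- c(54300, 43800, 33200)

# The left-hand side is inserted as the FIRST argument of round()
piped  <- gdp_values |> round(digits = -3)
nested <- round(gdp_values, digits = -3)

identical(piped, nested)  # TRUE
piped                     # 54000 44000 33000 (rounded to thousands)
```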

Comments – Code as Communication

# BAD: comments that just repeat the code
x <- x + 1 # add 1 to x

# GOOD: comments that explain WHY
x <- x + 1 # index starts at 0, shift to 1-based for plotting

# Use comments to:
# - explain decisions ("why log-transform? right skew in raw data")
# - mark sections of your script
# - temporarily disable code during testing

Your most important collaborator is your future self: six months from now, you will not remember why you did what you did. Thanks to our Quarto literate programming workflow, you are living in the golden age of commenting – you can write full prose between code cells instead of apologizing in hashtags later.

Part II: Data Types

What Kind of Information Does R Store?

Type        Description        Example
double      Decimal numbers    3.14, 54300.5
integer     Whole numbers      42L, 2026L
character   Text / strings     "Germany", "hello"
logical     True or false      TRUE, FALSE
factor      Categories         "low", "medium", "high"
Date        Calendar dates     2026-02-23

Numeric: Double & Integer

gdp <- 54300.5 # double (default for numbers)
seats <- 736L # integer (the L suffix makes it explicit)

class(gdp)
[1] "numeric"
class(seats)
[1] "integer"
  • For most purposes in this course, the integer/double distinction doesn’t matter – R handles it behind the scenes. But it’s good to know it exists.
  • It is important to know how to check whether an object is numeric: is.numeric() returns TRUE for both doubles and integers.
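Here is a quick way to run that check, as a sketch. is.numeric() answers the practical question "can I do math with this?", while typeof() shows the storage type underneath.

```r
gdp <- 54300.5
seats <- 736L

is.numeric(gdp)    # TRUE
is.numeric(seats)  # TRUE: integers count as numeric too
is.numeric("42")   # FALSE: a number in quotes is text

typeof(gdp)        # "double"
typeof(seats)      # "integer"
```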

Character / String

capital <- 'Berlin' # single or double quotes both work
country <- "Germany" # but double quotes are sometimes easier to see

nchar(country) # how many characters?
[1] 7
toupper(country) # string manipulation
[1] "GERMANY"
paste(country, "is in the EU")
[1] "Germany is in the EU"

Character / String

Numbers stored as characters are not numbers – R won’t do math with them:

"42" + 1 # this does not work
Error in `"42" + 1`:
! non-numeric argument to binary operator
as.numeric("42") + 1 # transforming the string to numeric makes it work
[1] 43
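Conversion only works when the text actually looks like a number. This sketch uses hypothetical survey answers: anything as.numeric() cannot parse becomes NA, and R warns "NAs introduced by coercion".

```r
# Hypothetical responses that were read in as text
raw <- c("42", "3.5", "n/a")

nums <- as.numeric(raw)  # warning: NAs introduced by coercion
nums                     # 42, 3.5 and NA
is.na(nums)              # FALSE FALSE TRUE
sum(nums, na.rm = TRUE)  # 45.5
```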

Logical / Boolean

is_eu_member <- TRUE

is_eu_member
[1] TRUE
population <- 84000000

# Comparisons produce logical values
population > 50000000
[1] TRUE
population == 84000000
[1] TRUE
!is_eu_member # NOT
[1] FALSE

Logicals are the backbone of filtering – you’ll use them constantly when subsetting data.
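In arithmetic, TRUE counts as 1 and FALSE as 0, so logical vectors can be summed or averaged directly. This sketch uses the population figures (in millions) from the data frame example in Part III:

```r
population <- c(84, 68, 60, 47, 88)  # millions

large <- population > 65
large        # TRUE TRUE FALSE FALSE TRUE
sum(large)   # 3: how many are TRUE
mean(large)  # 0.6: share that is TRUE
```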

Logical Operators

Operator   Meaning                          Example
<          Less than                        3 < 5
<=         Less than or equal               3 <= 3
>          Greater than                     5 > 3
>=         Greater than or equal            5 >= 5
==         Equal to                         3 == 3
!=         Not equal to                     3 != 5
&          AND (vectorised)                 (3 > 1) & (5 > 2)
|          OR (vectorised)                  (3 > 5) | (5 > 2)
!          NOT                              !(3 > 5)
%in%       Is element contained in vector   "Germany" %in% c("France", "Germany")
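The vectorised operators work element by element, which is exactly what subsetting needs. This sketch uses the five countries from the data frame example in Part III:

```r
countries  <- c("Germany", "France", "Italy", "Spain", "Turkey")
population <- c(84, 68, 60, 47, 88)  # millions
eu_member  <- c(TRUE, TRUE, TRUE, TRUE, FALSE)

# & compares position by position: large AND in the EU
population > 65 & eu_member                    # TRUE TRUE FALSE FALSE FALSE

# | : small OR outside the EU
population < 50 | !eu_member                   # FALSE FALSE FALSE TRUE TRUE

# %in% checks each element against a set of values
countries %in% c("France", "Spain", "Poland")  # FALSE TRUE FALSE TRUE FALSE
```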

Specialized Types: Factors & Dates

Factors – categorical variables with fixed levels

regime <- factor(
  c("democracy", "autocracy", "democracy", "hybrid"),
  levels = c("autocracy", "hybrid", "democracy")
)

regime
[1] democracy autocracy democracy hybrid   
Levels: autocracy hybrid democracy
levels(regime)
[1] "autocracy" "hybrid"    "democracy"
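The sketch below shows two consequences of "fixed levels". First, table() counts in level order rather than alphabetical order. Second, a value that is not one of the levels turns into NA. Internally, a factor is stored as integer codes that point to the levels.

```r
regime <- factor(
  c("democracy", "autocracy", "democracy", "hybrid"),
  levels = c("autocracy", "hybrid", "democracy")
)

table(regime)       # autocracy 1, hybrid 1, democracy 2 (level order)
as.integer(regime)  # 3 1 3 2: the integer codes behind the labels

# A value outside the defined levels becomes NA
factor("monarchy", levels = levels(regime))
```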

Dates

today <- Sys.Date()
today
[1] "2026-02-19"
treaty <- as.Date("1957-03-25") # Treaty of Rome
treaty
[1] "1957-03-25"
today - treaty # date arithmetic!
Time difference of 25168 days

You’ll encounter these often in real data. The key insight: R treats them differently from plain text so it can do useful things like ordering, arithmetic, and plotting.
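Here are a few more date operations, as a sketch. A fixed date (the lecture date) replaces Sys.Date() so the results are reproducible. The other date is the Treaty of Rome from above.

```r
treaty  <- as.Date("1957-03-25")  # Treaty of Rome
lecture <- as.Date("2026-02-19")  # fixed instead of Sys.Date()

lecture - treaty              # Time difference of 25168 days
as.numeric(lecture - treaty)  # 25168, as a plain number
treaty + 30                   # "1957-04-24": adding days
lecture > treaty              # TRUE: dates compare chronologically
difftime(lecture, treaty, units = "weeks")  # the same gap, in weeks
```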

Part III: Data Structures

How Is Information Organized?

Structure    Dimensions   Types allowed
Vector       1D           One
Matrix       2D           One
Array        nD           One
Data frame   2D           Multiple (one per column)
List         Any          Multiple

Vectors – The Building Block

# c() combines values into a vector
countries <- c("Germany", "France", "Italy", "Spain", "Turkey")
population <- c(84, 68, 60, 47, 88) # millions
eu_member <- c(TRUE, TRUE, TRUE, TRUE, FALSE)

length(countries) # number of elements in vector
[1] 5
class(countries) # check type
[1] "character"
class(eu_member) # check type
[1] "logical"

Vectors – The Building Block

A vector can hold only one type. If you mix types, R silently converts (coerces) all elements to the most flexible type present:

mixed <- c(1, 2, "three") # all become character!
mixed
[1] "1"     "2"     "three"
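The coercion follows a fixed order: logical < integer < double < character. A sketch of that ladder, plus the explicit conversion back, where unparseable text becomes NA with a warning:

```r
c(TRUE, FALSE, 5)   # 1 0 5: logical -> double
class(c(TRUE, 2L))  # "integer": logical -> integer
c(1.5, "a")         # "1.5" "a": double -> character

# Converting back must be done explicitly
as.numeric(c("1", "2", "three"))  # 1 2 NA, with a coercion warning
```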

Matrices & Arrays – Briefly

m <- matrix(1:6, nrow = 2, ncol = 3)
m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
  • 2D, one type
  • Used for linear algebra behind the scenes
  • In practice: once you meet data frames, you’ll rarely use matrices directly
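For completeness, here is a short sketch of working with the matrix m from above. A matrix is indexed by [row, column], like the data frames introduced next. Note that R fills matrices column by column unless you set byrow = TRUE.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)   # 2 3: rows, columns
m[2, 3]  # 6: row 2, column 3
m * 10   # arithmetic applies to every cell

# byrow = TRUE fills row by row instead
matrix(1:6, nrow = 2, byrow = TRUE)  # rows: 1 2 3 and 4 5 6
```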

Data Frames – The Workhorse

  • Rectangular (rows × columns)
  • Each column is a vector (one type)
  • Columns can be different types from each other
  • Each row is an observation
eu_data <- data.frame(
  country = c("Germany", "France", "Italy", "Spain", "Turkey"),
  population = c(84, 68, 60, 47, 88),
  eu_member = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  regime = c("democracy", "democracy", "democracy", "democracy", "hybrid")
)

eu_data
  country population eu_member    regime
1 Germany         84      TRUE democracy
2  France         68      TRUE democracy
3   Italy         60      TRUE democracy
4   Spain         47      TRUE democracy
5  Turkey         88     FALSE    hybrid
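To confirm that each column is its own vector with a single type, sapply() can apply class() to every column. lengths() shows that all columns have the same length, one value per row. A sketch:

```r
eu_data <- data.frame(
  country    = c("Germany", "France", "Italy", "Spain", "Turkey"),
  population = c(84, 68, 60, 47, 88),
  eu_member  = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  regime     = c("democracy", "democracy", "democracy", "democracy", "hybrid")
)

# One class per column:
# country "character", population "numeric", eu_member "logical", regime "character"
sapply(eu_data, class)

# Every column has one value per row
lengths(eu_data)  # 5 5 5 5
```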

Data Frames – The Workhorse

nrow(eu_data) # number of rows (observations)
[1] 5
ncol(eu_data) # number of columns (variables)
[1] 4
head(eu_data) # first 6 rows by default (here: all 5)
  country population eu_member    regime
1 Germany         84      TRUE democracy
2  France         68      TRUE democracy
3   Italy         60      TRUE democracy
4   Spain         47      TRUE democracy
5  Turkey         88     FALSE    hybrid
str(eu_data)
'data.frame':   5 obs. of  4 variables:
 $ country   : chr  "Germany" "France" "Italy" "Spain" ...
 $ population: num  84 68 60 47 88
 $ eu_member : logi  TRUE TRUE TRUE TRUE FALSE
 $ regime    : chr  "democracy" "democracy" "democracy" "democracy" ...
summary(eu_data)
   country            population   eu_member          regime         
 Length:5           Min.   :47.0   Mode :logical   Length:5          
 Class :character   1st Qu.:60.0   FALSE:1         Class :character  
 Mode  :character   Median :68.0   TRUE :4         Mode  :character  
                    Mean   :69.4                                     
                    3rd Qu.:84.0                                     
                    Max.   :88.0                                     

Lists – The Flexible Container

country_info <- list(
  name = "Germany",
  population = 84000000,
  eu_member = TRUE,
  parties = c("SPD", "CDU", "Greens", "FDP")
)

country_info
$name
[1] "Germany"

$population
[1] 8.4e+07

$eu_member
[1] TRUE

$parties
[1] "SPD"    "CDU"    "Greens" "FDP"   

Lists can hold anything – including other lists. They’re less intuitive than data frames, but you’ll encounter them constantly as outputs of statistical models.
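Reaching inside a list works with $, as covered in Part IV. This sketch uses country_info from above. Single brackets return a smaller list. Double brackets [[ ]] (not covered on these slides) return the element itself.

```r
country_info <- list(
  name = "Germany",
  population = 84000000,
  eu_member = TRUE,
  parties = c("SPD", "CDU", "Greens", "FDP")
)

country_info$parties     # the character vector stored inside
country_info$parties[2]  # "CDU": index into that vector
length(country_info)     # 4: number of elements in the list

class(country_info["name"])    # "list": [ ] keeps the list wrapper
class(country_info[["name"]])  # "character": [[ ]] extracts the element
```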

Part IV: Accessing & Exploring Data

Reaching Inside Structures

Two main tools:

  • [ ] – index by position or condition
  • $ – index by name (for data frames and lists)

Indexing Vectors with [ ]

countries <- c("Germany", "France", "Italy", "Spain", "Poland")

countries[1] # first element
[1] "Germany"
countries[c(1, 3)] # first and third
[1] "Germany" "Italy"  
countries[-2] # everything except second
[1] "Germany" "Italy"   "Spain"   "Poland" 
population <- c(84, 68, 60, 47, 38)

# Logical indexing — filter by condition
countries[population > 50]
[1] "Germany" "France"  "Italy"  
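Conditions can be combined inside the brackets. The sketch below also uses which() and which.max() to return positions. It ends with what happens when you index past the end of a vector.

```r
countries  <- c("Germany", "France", "Italy", "Spain", "Poland")
population <- c(84, 68, 60, 47, 38)  # millions

# Combine conditions with & inside the brackets
countries[population > 50 & population < 80]  # "France" "Italy"

which(population > 50)            # 1 2 3: positions where the condition is TRUE
countries[which.max(population)]  # "Germany": largest population

countries[6]  # NA: there is no sixth element
```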

Indexing DFs with [ ] and $

# [row, column] -- leave blank to mean "all"
eu_data[1, ] # first row, all columns
  country population eu_member    regime
1 Germany         84      TRUE democracy
eu_data[, 2] # all rows, second column
[1] 84 68 60 47 88
eu_data[1, 2] # specific cell
[1] 84
# $ accesses a column by name -- returns a vector
eu_data$country
[1] "Germany" "France"  "Italy"   "Spain"   "Turkey" 
eu_data$population
[1] 84 68 60 47 88
# Combine: filter rows by condition
eu_data[eu_data$population > 65, ]
  country population eu_member    regime
1 Germany         84      TRUE democracy
2  France         68      TRUE democracy
5  Turkey         88     FALSE    hybrid
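The row and column slots can also be combined: filter rows with two conditions and select columns by name. The same $ syntax also adds new columns. In this sketch, the column name large is hypothetical.

```r
eu_data <- data.frame(
  country    = c("Germany", "France", "Italy", "Spain", "Turkey"),
  population = c(84, 68, 60, 47, 88),
  eu_member  = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  regime     = c("democracy", "democracy", "democracy", "democracy", "hybrid")
)

# EU members with more than 65 million people; two columns picked by name
eu_data[eu_data$eu_member & eu_data$population > 65, c("country", "population")]
# rows 1 and 2: Germany (84) and France (68)

# A column picked by name with [ , ] is the same vector that $ returns
identical(eu_data[, "population"], eu_data$population)  # TRUE

# $ with <- adds a new column (hypothetical name "large")
eu_data$large <- eu_data$population > 65
ncol(eu_data)  # 5
```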

Exploring Data – Your First Questions

str(eu_data) # structure: types, dimensions, preview
'data.frame':   5 obs. of  4 variables:
 $ country   : chr  "Germany" "France" "Italy" "Spain" ...
 $ population: num  84 68 60 47 88
 $ eu_member : logi  TRUE TRUE TRUE TRUE FALSE
 $ regime    : chr  "democracy" "democracy" "democracy" "democracy" ...

Exploring Data – Your First Questions

summary(eu_data) # summary statistics per column
   country            population   eu_member          regime         
 Length:5           Min.   :47.0   Mode :logical   Length:5          
 Class :character   1st Qu.:60.0   FALSE:1         Class :character  
 Mode  :character   Median :68.0   TRUE :4         Mode  :character  
                    Mean   :69.4                                     
                    3rd Qu.:84.0                                     
                    Max.   :88.0                                     

Exploring Data – Your First Questions

head(eu_data, 2) # first n rows
  country population eu_member    regime
1 Germany         84      TRUE democracy
2  France         68      TRUE democracy
class(eu_data) # what kind of object is this?
[1] "data.frame"
class(eu_data$population) # what class/type is this variable?
[1] "numeric"

These are some of the first things you should run whenever you encounter a new dataset.
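A few more first questions worth asking of a new dataset, sketched with eu_data: how categories are distributed, which distinct values occur, and whether anything is missing.

```r
eu_data <- data.frame(
  country    = c("Germany", "France", "Italy", "Spain", "Turkey"),
  population = c(84, 68, 60, 47, 88),
  eu_member  = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  regime     = c("democracy", "democracy", "democracy", "democracy", "hybrid")
)

table(eu_data$regime)    # democracy 4, hybrid 1: counts per category
unique(eu_data$regime)   # "democracy" "hybrid": distinct values
colSums(is.na(eu_data))  # missing values per column: all 0 in this toy data
```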

Time to Practice!

Walkthrough Overview

  • Clone the Week 3 exercise repo from GitHub (same workflow as last week)
  • Open exercise_week03.qmd in Positron
  • Work through the tasks in order
  • Fast finishers: optional depth exercises at the bottom of the document

Exercise Structure

Core tasks (everyone):

  1. Create vectors of different types and check their class
  2. Build a small data frame from scratch – use a political science example
  3. Subset with [ ] and $
  4. Write clean, commented code throughout

Optional depth exercises (fast movers):

  • Working with factors: level ordering and table()
  • Date arithmetic with as.Date() and lubridate

Homework

  • Distributed via GitHub (same workflow as Week 2)
  • Submit a rendered Quarto document (.qmd + rendered PDF)
  • Due: one week from today

The homework follows the same arc as today – you’ll work with a small political dataset to practice types, structures, and subsetting. Read the task descriptions carefully and write comments explaining your reasoning.

Thank you for your attention and see you next week!